Phase-Parametric Policies for Reinforcement Learning in Cyclic Environments
Authors
Abstract
In many reinforcement learning problems, the parameters of the model may vary with its phase while the agent learns through interaction with the environment. For example, the reward an autonomous car receives for selecting a path may depend on traffic conditions at that time of day, and the transition dynamics of a drone may depend on the current wind direction. Many such processes exhibit a cyclic phase structure and can be represented by a control policy parameterized over a circular (cyclic) phase space. Modeling such phase variations with a standard data-driven approach (e.g., deep networks) that does not explicitly represent the phase can be challenging, because ambiguities arise when the optimal action for a given state varies with the phase. To better model cyclic environments, we propose phase-parameterized policies and value function approximators that explicitly enforce a cyclic structure on the policy or value space. We apply our phase-parameterized reinforcement learning approach to both feed-forward and recurrent deep networks in the context of trajectory optimization and locomotion problems. Our experiments show that the proposed approach models cyclic environments better than traditional function approximators.
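To make the idea of a policy parameterized over a circular phase space concrete, here is a minimal sketch in plain Python. It is not the authors' architecture, which the abstract does not specify. It assumes a linear policy whose weights are a first-order Fourier blend of three weight vectors, so the weights are periodic in the phase by construction. The names `phase_features`, `phase_weights`, `policy_action`, and the values in `CONTROL_WEIGHTS` are hypothetical and chosen only for illustration.

```python
import math


def phase_features(phase):
    """Map a phase angle (radians) to a periodic basis: (1, cos, sin)."""
    return (1.0, math.cos(phase), math.sin(phase))


def phase_weights(control_weights, phase):
    """Blend the weight vectors into one vector that is periodic in phase."""
    basis = phase_features(phase)
    dim = len(control_weights[0])
    return [
        sum(b * w[i] for b, w in zip(basis, control_weights))
        for i in range(dim)
    ]


def policy_action(control_weights, state, phase):
    """Linear policy: the same state yields different actions at different phases."""
    weights = phase_weights(control_weights, phase)
    return sum(w * s for w, s in zip(weights, state))


# Hypothetical, hand-picked weights (assumption): one vector per basis term.
CONTROL_WEIGHTS = [
    [0.5, -0.2],  # constant term
    [1.0, 0.0],   # cosine term
    [0.0, 1.0],   # sine term
]

state = [1.0, 2.0]
a_start = policy_action(CONTROL_WEIGHTS, state, 0.0)
a_half = policy_action(CONTROL_WEIGHTS, state, math.pi)
a_wrapped = policy_action(CONTROL_WEIGHTS, state, 2 * math.pi)
```

In this sketch the action at phase 0 and at phase 2π agree up to floating-point error, while the action half a cycle later differs for the same state. A phase-agnostic approximator that sees only the state cannot represent that difference, which is the ambiguity the abstract describes.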
Similar resources
Multi-Agent Deep Reinforcement Learning
This work introduces a novel approach for solving reinforcement learning problems in multi-agent settings. We propose a state reformulation of multi-agent problems in R that allows the system state to be represented in an image-like fashion. We then apply deep reinforcement learning techniques with a convolutional neural network as the Q-value function approximator to learn distributed multi-agen...
Balancing Learning and Engagement in Game-Based Learning Environments with Multi-objective Reinforcement Learning
Game-based learning environments create rich learning experiences that are both effective and engaging. Recent years have seen growing interest in data-driven techniques for tutorial planning, which dynamically personalize learning experiences by providing hints, feedback, and problem scenarios at runtime. In game-based learning environments, tutorial planners are designed to adapt gameplay eve...
Subgoal Discovery for Hierarchical Reinforcement Learning Using Learned Policies
Reinforcement learning addresses the problem of learning to select actions in order to maximize an agent’s performance in unknown environments. To scale reinforcement learning to complex real-world tasks, agents must be able to discover hierarchical structures within their learning and control systems. This paper presents a method by which a reinforcement learning agent can discover subgoals wit...
Policy Improvement for several Environments
In this paper we state a generalized form of the policy improvement algorithm for reinforcement learning. This new algorithm can be used to find stochastic policies that optimize single-agent behavior for several environments and reinforcement functions simultaneously. We first introduce a geometric interpretation of policy improvement, define a framework to apply one policy to several envir...
Policy Improvement for several Environments Extended Version
In this paper we state a generalized form of the policy improvement algorithm for reinforcement learning. This new algorithm can be used to find stochastic policies that optimize single-agent behavior for several environments and reinforcement functions simultaneously. We first introduce a geometric interpretation of policy improvement, define a framework to apply one policy to several envir...